Optimal income taxation: an example with a U-shaped pattern of optimal marginal tax rates

Peter Diamond¹

Income effects complicate tax analysis. In the presence of distorting taxes, income effects imply that lump-sum taxes have efficiency effects. Thus it is instructive to examine the optimal income tax in the Mirrlees (1971) model for the case of no income effects², the case where everyone has the utility function

$$u(x,y) = x + v(1-y), \tag{1}$$

where x is consumption, y is labor (as a fraction of available time, so that 1-y is leisure), and v is assumed to be strictly concave.³ The only difference assumed across individuals is a difference in skills, with an individual of skill n having a marginal product equal to n. The optimal income taxation problem is the maximization of the integral over the population of a concave function of utilities, subject to an aggregate budget constraint and subject to the constraint that individuals optimize in their choice of labor supply given the relationship between work and after-tax income.

If, above some income level, the distribution of skills is the exponential distribution and the utility-of-leisure function, v(1-y), is the logarithm, then the marginal tax rate is increasing. Other combinations of utility function and skill distribution yielding the same conclusion are also given, including the combination of a constant elasticity of labor supply and the Pareto distribution of skills.

With either a logarithmic utility-of-leisure or a constant elasticity of labor supply, the marginal tax rate is falling where the density of skills is rising and the income level is high enough that one wants to transfer resources from that skill level to the government budget. Combining this result with the one above, we have examples with a U-shaped pattern of optimal marginal tax rates if the density of skills first rises and then falls. This argument does not refer to the lowest skill levels, only to the levels of skills from which it is desired to transfer income.

1 I am grateful to Jon Gruber for suggesting the inclusion of data in this paper and providing me with the figures presented below, and to Jay Wilson for helpful suggestions. I am also grateful to the National Science Foundation for research support.

2 This assumption is probably more appropriate at high income levels, since people at the top of the income distribution are likely to leave large estates and may not adjust their earnings to the exact level of estate.

3 For a discussion of this case with a constant elasticity of labor supply, see Atkinson (1990). The complementary case where utility is linear in leisure has been studied; see Lollivier and Rochet (1983).

The argument for a U-shaped pattern of marginal tax rates assumes that there is no upper bound to skill levels. As is well known, in the presence of an upper bound the marginal rate on the highest skill should be zero.⁴ This analysis shows that the shape of the distribution near the top is essential for determining the pattern of taxation near the top. There is no need for tax rates to decline slowly toward zero as one approaches the absolute top of the skill distribution.

The absence of income effects also allows an intuitive grasp of the factors that determine the optimal tax structure. Increasing the marginal tax rate affecting some skill level increases the deadweight burden for people at this skill level. Thus, the optimal marginal tax rate at some income level depends on the elasticity of labor supply at that income level, since this elasticity governs the marginal distortion. Increasing the marginal tax rate also transfers income from all individuals with higher skills to the government, without changing the distortions of their labor supplies. The weights on these two elements depend on the ratio of individuals with skills above this level to individuals with skills at this level. This intuition is displayed in the equations below, where the first-order condition for the optimal income tax is written as a product of these three terms.

Model 

We assume that a person of type n has a marginal product n, with the distribution of skills written as F(n). Thus, the ratio of people with higher skills to those with given skills is (1-F(n))/f(n), where f is the density associated with the distribution F. Moreover, it is assumed that the social welfare function is the integral of a strictly concave function of utility, G(u), with utility written in the form given in equation (1) and G independent of n. Thus the objective function to be maximized is

$$\int_0^{\infty} G\big(x(n) + v(1-y(n))\big)\, f(n)\,dn, \tag{2}$$

with (x(n), y(n)) chosen optimally by individuals of type n given the tax function, T, that determines individual budget constraints,

$$x(n) = n\,y(n) - T\big(n\,y(n)\big). \tag{3}$$
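The individual's side of the problem in (1) and (3) can be sketched numerically. This is a minimal illustration under assumptions introduced here, not part of the original analysis: the logarithmic utility-of-leisure v(1-y) = log(1-y) used below, and a hypothetical linear schedule T(z) = lump_sum + t z with a flat marginal rate t. The first-order condition 1/(1-y) = (1-t)n then gives y = 1 - 1/((1-t)n) when (1-t)n > 1, and y = 0 otherwise. The code checks that closed form against a brute-force grid search and shows that changing the lump-sum component leaves labor supply unchanged, which is what the absence of income effects means here.

```python
import math


def labor_supply_log(n, t):
    """Closed-form labor supply for u = x + log(1 - y) with a flat marginal rate t.

    Interior first-order condition: 1 / (1 - y) = (1 - t) * n,
    so y = 1 - 1 / ((1 - t) * n); labor supply is zero when (1 - t) * n <= 1.
    """
    net_wage = (1.0 - t) * n
    if net_wage <= 1.0:
        return 0.0
    return 1.0 - 1.0 / net_wage


def utility(y, n, t, lump_sum=0.0):
    """u = x + log(1 - y), where x = n*y - T(n*y) and T(z) = lump_sum + t*z (hypothetical)."""
    x = n * y - (lump_sum + t * n * y)
    return x + math.log(1.0 - y)


def grid_labor_supply(n, t, lump_sum=0.0, steps=100_000):
    """Brute-force maximizer of utility over the grid y = k / steps, 0 <= k < steps."""
    best_y, best_u = 0.0, utility(0.0, n, t, lump_sum)
    for k in range(1, steps):
        y = k / steps
        u = utility(y, n, t, lump_sum)
        if u > best_u:
            best_y, best_u = y, u
    return best_y


if __name__ == "__main__":
    for n in (1.2, 2.0, 4.0, 8.0):
        print(n, labor_supply_log(n, 0.3), grid_labor_supply(n, 0.3))
```

The grid search is deliberately naive; its only role is to confirm the closed form without relying on any optimization library.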

To examine the optimal tax, we start with the optimal tax formula for the case of the additive utility function, equation (21) in Mirrlees, specialized for the utility function linear in consumption:

$$p\,(n - v')\,f = \left[\frac{v' - y\,v''}{n}\right]\left[\int_n^{\infty} (p - G')\,dF\right], \tag{4}$$

where p is the Lagrange multiplier on the government's budget constraint and the functions v and G are evaluated at the appropriate labor supplies. We note that p is equal to the average of G' over the entire population, since consumption can be transferred one-for-one between the government and the households, provided the transfers are equal per capita. Thus, the integral in (4) is first increasing and then decreasing as a function of n. Since utility rises with skill and G is strictly concave, G' falls with n, so the integrand p - G' is negative at low skills and positive at high skills; the integral is zero when taken over the whole population and goes to zero as n rises without limit.

4 Sadka (1976), Seade (1977).
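The hump shape of the integral in (4) can be illustrated with a discrete population. The sketch below is only an illustration under hypothetical inputs: ten equally weighted skill levels n = 1, ..., 10 and social marginal utilities G' = 1/n, taken as given rather than derived from an optimal tax. It sets p to the population average of G' and accumulates (p - G') times the population weight from the top of the distribution down.

```python
def tail_integrals(gprime, weights):
    """Discrete analogue of the integral from n to infinity of (p - G') dF.

    gprime: social marginal utility G' at each skill level, ordered by skill.
    weights: population weight of each skill level (need not sum to one).
    Returns (p, tails), where p is the weighted average of G' and tails[i]
    is the sum of (p - G'_j) * prob_j over all j >= i.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    p = sum(g * q for g, q in zip(gprime, probs))  # p equals the average of G'
    tails = []
    running = 0.0
    for g, q in zip(reversed(gprime), reversed(probs)):  # accumulate from the top
        running += (p - g) * q
        tails.append(running)
    tails.reverse()
    return p, tails


if __name__ == "__main__":
    skills = list(range(1, 11))
    p, tails = tail_integrals([1.0 / n for n in skills], [1.0] * len(skills))
    for n, value in zip(skills, tails):
        print(n, round(value, 4))
```

With these inputs the sum starts at zero, rises while G' exceeds p, and peaks at n = 4, the first skill level where G' falls below p, after which it declines.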

It is convenient to use the marginal condition for individual choice,

$$v' = (1 - T')\,n, \tag{5}$$

where T' is the marginal tax rate, and the definition of the elasticity of labor supply,

$$e = -\frac{v'}{y\,v''}. \tag{6}$$

Then, using (5) and (6), we can write (4) as

$$\frac{T'}{1-T'} = \left[\frac{1 + e^{-1}}{n}\right]\frac{\int_n^{\infty} (p - G')\,dF}{p\,f}. \tag{7}$$
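For readers who want the intermediate step, the passage from (4) to (7) uses only (5) and (6); the short derivation below is reconstructed from those definitions. By (5), $n - v' = nT'$ and $1 - T' = v'/n$, so (4) gives

$$T' = \frac{v' - y\,v''}{n^2}\,\frac{\int_n^{\infty} (p - G')\,dF}{p\,f}.$$

Dividing by $1 - T' = v'/n$ and using $-y\,v''/v' = e^{-1}$ from (6),

$$\frac{T'}{1-T'} = \frac{v' - y\,v''}{n\,v'}\,\frac{\int_n^{\infty} (p - G')\,dF}{p\,f} = \left[\frac{1 + e^{-1}}{n}\right]\frac{\int_n^{\infty} (p - G')\,dF}{p\,f},$$

which is (7). Dividing once more by $1 - T' = v'/n$ replaces the first bracket by $(v' - y\,v'')/(v')^2$, the form that appears in (9).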

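Multiplying and dividing (7) by (1-F) to turn the integral into an average term, we can rewrite the first-order condition as

$$\frac{T'}{1-T'} = \left[\frac{1 + e^{-1}}{n}\right]\left[\frac{\int_n^{\infty} (p - G')\,dF}{p\,(1-F)}\right]\left[\frac{1-F}{f}\right]. \tag{8}$$

Using (5) again, we can write it as

$$\frac{T'}{(1-T')^2} = \left[\frac{v' - y\,v''}{(v')^2}\right]\left[\frac{\int_n^{\infty} (p - G')\,dF}{p\,(1-F)}\right]\left[\frac{1-F}{f}\right]. \tag{9}$$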

Rising  Marginal  Tax  Rates 

We are now prepared to examine the pattern of marginal tax rates given by these optimality conditions, considering the levels of skills at which labor supply is positive. Equation (8) shows the three terms mentioned above. The second term reflects the gain from transferring resources from workers with skill levels above n to the government, averaged over the people paying the higher taxes. The first term reflects the distortion associated with such a change, recognizing that a one-dollar increase in taxes paid requires a smaller increase in tax rates the higher the level of skill at the income level at which taxes are increased. The third term gives the ratio of individuals above this skill level to individuals at this skill level.

If, above some income level, the distribution of skills is the exponential distribution, then the third term, the ratio (1-F)/f, is a constant, independent of skill level. If the utility-of-leisure v(1-y) is the logarithm, then, where labor supply is positive, the first term in (9), $(v' - y v'')/(v')^2$, is equal to 1 independent of skill level. With these two assumptions, the marginal tax rate will vary with the remaining term, which equals one minus the average of G'/p over all skills above n. This yields an increasing marginal tax rate: since the social marginal utility of income is decreasing with income, that average falls as n rises, and the left-hand side of (9) is increasing in T'.
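The claim that the first term in (9) equals 1 for the logarithmic utility-of-leisure can be checked directly from (6) and (9). In this sketch v' and v'' are passed as functions of leisure z = 1 - y; the k log z family corresponds to footnote 5, and the constant-elasticity specification v(z) = -(1-z)^(1+1/eps)/(1+1/eps) is an assumption introduced here because it produces a labor-supply elasticity equal to eps at every y.

```python
def first_term_9(v1, v2, y):
    """First bracket in (9): (v' - y v'') / (v')**2, with v', v'' evaluated at leisure 1 - y."""
    a, b = v1(1.0 - y), v2(1.0 - y)
    return (a - y * b) / a**2


def elasticity(v1, v2, y):
    """Labor-supply elasticity from (6): e = -v' / (y v'')."""
    return -v1(1.0 - y) / (y * v2(1.0 - y))


def log_leisure(k=1.0):
    """Derivatives of v(z) = k log z, where z = 1 - y is leisure."""
    def v1(z):
        return k / z

    def v2(z):
        return -k / z**2

    return v1, v2


def constant_elasticity(eps):
    """Derivatives of v(z) = -(1 - z)**(1 + 1/eps) / (1 + 1/eps), so that v'(z) = y**(1/eps)."""
    def v1(z):
        return (1.0 - z) ** (1.0 / eps)

    def v2(z):
        return -(1.0 / eps) * (1.0 - z) ** (1.0 / eps - 1.0)

    return v1, v2


if __name__ == "__main__":
    for y in (0.1, 0.5, 0.9):
        print(y, first_term_9(*log_leisure(), y), elasticity(*constant_elasticity(0.5), y))
```

For k log z the first term equals 1/k at every y, which is why footnote 5 treats these utility functions as equivalent for this purpose; the logarithmic case also has elasticity (1-y)/y, which varies with labor supply.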

Thus, over the range where the distribution of skills is exponential, marginal tax rates increase if the utility-of-leisure function is logarithmic. The same conclusion would hold for a distribution that goes to zero more slowly than the exponential, so that the third term in (9) is increasing.⁵ In addition, the result follows for any utility function for which the first term is increasing as one moves up the skill distribution (and so up the level of labor).

Another combination of assumptions yielding the same conclusion can be found by moving n, in equation (8), from the first term to the third term. Then we have rising marginal rates where the elasticity of labor supply is constant or falling and (1-F)/nf is constant or rising.⁶ The latter holds for the Pareto distribution, where the density is proportional to $1/n^{1+a}$ for $a > 0$.
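The two ratios used in this section can be checked numerically. In this sketch the exponential distribution has density b e^(-bn), and the Pareto distribution has density a n_min^a / n^(1+a) for n >= n_min; the scale n_min = 1 is an arbitrary choice that drops out of both ratios. For the exponential, (1-F)/f is the constant 1/b; for the Pareto, whose tail is fatter, (1-F)/f = n/a rises with n while (1-F)/(nf) stays at 1/a.

```python
import math


def exponential_ratio(n, b):
    """(1 - F)/f for the exponential distribution with coefficient b: the constant 1/b."""
    survival = math.exp(-b * n)
    density = b * math.exp(-b * n)
    return survival / density


def pareto_ratios(n, a, n_min=1.0):
    """((1 - F)/f, (1 - F)/(n f)) for a Pareto density proportional to 1/n**(1 + a), n >= n_min."""
    survival = (n_min / n) ** a
    density = a * n_min**a / n ** (1.0 + a)
    ratio_f = survival / density
    return ratio_f, ratio_f / n


if __name__ == "__main__":
    for n in (2.0, 5.0, 10.0, 50.0):
        print(n, exponential_ratio(n, 0.1), pareto_ratios(n, 1.5))
```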

Asymptotic  Marginal  Rates 

With a known finite top to the skill distribution, the optimal marginal tax rate should be zero at the top of the income distribution. This need not imply that rates start to decline until very close to the top.⁷ Thus it has been natural to consider the behavior of the optimal marginal tax rate satisfying the conditions above as skills rise without limit. If the welfare weight of those at the top tends to zero as skill rises without limit, then the tax rate tends to the revenue-maximizing rate. This can be derived from the equations above after noting that if G' goes to zero as n rises without limit, then the middle term in (9) goes to 1. If the utility-of-leisure function is logarithmic, then the first term in (9) is also equal to 1 and the tax rate converges to the solution to

$$\frac{T'}{(1-T')^2} = \frac{1-F}{f}. \tag{10}$$

If the distribution of skills is exponential toward the top, (1-F)/f is equal to 1/b, where b is the coefficient in the exponential distribution. Thus the tax rate converges to

$$1 - T' = \frac{\left(b^2 + 4b\right)^{1/2} - b}{2}. \tag{11}$$

For values of (1-F)/f, that is, 1/b, of 5, 10, and 15 (see Tables 1 and 2, below), the optimal marginal tax rate tends to 64%, 73%, and 77%.

5 The logarithmic utility function has the first term in (9) equal to 1. Varying the value of the constant equal to the first term generates the utility functions k log(1-y) for constants k, utility functions that do not differ in this implication.

6 Hausman has estimated the elasticity of labor supply by quintile and finds.

7 This point is also made by the numerical calculations in Tuomala (1984).
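The limiting rates quoted above follow from solving (10) as the quadratic (11). A minimal sketch, assuming, as in the text, the logarithmic utility-of-leisure and G' tending to zero at the top, so that the first two terms of (9) both equal 1:

```python
import math


def limit_rate_exponential(inv_b):
    """Limiting marginal rate from (10)-(11) when (1 - F)/f = 1/b.

    Assumes the logarithmic utility-of-leisure (first term of (9) equal to 1)
    and G' -> 0 at the top (middle term of (9) equal to 1).
    """
    b = 1.0 / inv_b
    one_minus_rate = (math.sqrt(b * b + 4.0 * b) - b) / 2.0  # equation (11)
    return 1.0 - one_minus_rate


if __name__ == "__main__":
    for inv_b in (5, 10, 15):
        rate = limit_rate_exponential(inv_b)
        print(f"1/b = {inv_b:2d}: T' -> {rate:.1%}")
```

Rounded to whole percentages, the three rates are the 64%, 73%, and 77% given in the text.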

One can do a similar calculation for the Pareto distribution. Assuming a constant elasticity of labor supply, e, and a Pareto distribution with coefficient a, so that (1-F)/(nf) equals 1/a, (8) becomes, once G' has gone to zero so that its middle term equals 1,

$$\frac{T'}{1-T'} = \frac{1 + e^{-1}}{a}. \tag{12}$$

Thus, the optimal tax rate converges to

$$T' = \frac{1 + e^{-1}}{a + 1 + e^{-1}}. \tag{13}$$

Identifying skill with the wage yields an elasticity based on adjusting hours of labor supply; identifying skill with an underlying ability implies a larger elasticity, since education is also variable. For an elasticity of 0.2 and a value of a of 0.5, the asymptotic optimal tax rate is 92%. Raising a to 1.5 lowers the tax rate to 80%.⁸ Raising the elasticity to 0.5 with a at 1.5 lowers the tax rate to 67%.
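Equation (13) can be evaluated in the same way. This sketch assumes, as the text does, a constant labor-supply elasticity e, a Pareto coefficient a, and G' tending to zero at the top:

```python
def limit_rate_pareto(e, a):
    """Limiting marginal rate from (13): T' = (1 + 1/e) / (a + 1 + 1/e).

    Assumes a constant labor-supply elasticity e, a Pareto coefficient a,
    and G' -> 0 at the top, so the middle term of (8) equals 1.
    """
    k = 1.0 + 1.0 / e
    return k / (a + k)


if __name__ == "__main__":
    for e, a in ((0.2, 0.5), (0.2, 1.5), (0.5, 1.5)):
        print(f"e = {e}, a = {a}: T' -> {limit_rate_pareto(e, a):.0%}")
```

The three parameter pairs reproduce the 92%, 80%, and 67% figures above; a higher elasticity or a thinner tail (larger a) both lower the limiting rate.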

Falling  Marginal  Rates 

It is also interesting to consider the case of a distribution of skills with a density that first rises and then falls. For this analysis we consider only the levels of skills at which G' is less than p, so that it is desirable to transfer resources away from this skill level (if it could be done costlessly). We assume that G' equals p at a skill level that is below the skill level that is the mode of the distribution of skills. It is now convenient to cancel the two (1-F) terms in equations (8) and (9), so that the latter two terms combine into $\int_n^{\infty} (p - G')\,dF/(pf)$. When the density is rising and p exceeds G', this combined term is declining: its numerator falls as n rises, because the skill levels dropped from the integral have positive integrands, while its denominator rises with f. With a constant elasticity of labor supply, the first term in (8), $(1 + e^{-1})/n$, is also falling; with a logarithmic utility-of-leisure, the first term in (9) is constant. Since the left-hand sides of (8) and (9) are increasing in T', with either assumption the marginal tax rate must be falling.
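The last step of this argument, that a falling right-hand side in (8) or (9) forces a falling marginal rate, amounts to inverting an increasing function of T'. The sketch below makes the inversion explicit; the right-hand-side values are hypothetical inputs supplied by the caller, since computing them would require solving for the full optimum.

```python
import math


def rate_log_utility(rhs9):
    """Solve T'/(1 - T')**2 = rhs9 for T' in [0, 1): the log utility-of-leisure case of (9).

    With s = 1 - T' the condition is rhs9 * s**2 + s - 1 = 0; the root in (0, 1] is used.
    """
    if rhs9 < 0:
        raise ValueError("this sketch only covers a nonnegative right-hand side")
    if rhs9 == 0:
        return 0.0
    s = (math.sqrt(1.0 + 4.0 * rhs9) - 1.0) / (2.0 * rhs9)
    return 1.0 - s


def rate_constant_elasticity(rhs8):
    """Solve T'/(1 - T') = rhs8 for T': the form of (8) used with a constant elasticity."""
    if rhs8 < 0:
        raise ValueError("this sketch only covers a nonnegative right-hand side")
    return rhs8 / (1.0 + rhs8)


if __name__ == "__main__":
    for rhs in (10.0, 5.0, 2.0, 1.0, 0.5):  # a falling right-hand side (hypothetical values)
        print(rhs, round(rate_log_utility(rhs), 4), round(rate_constant_elasticity(rhs), 4))
```

Both solutions are strictly increasing in the right-hand side, so any region where that side falls is a region of falling marginal rates. With a right-hand side of 5, the log-utility case gives the same 64% as (11) with 1/b = 5.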

Similarly, one can compare tax rates at two income levels on either side of the maximum in the density, chosen so that the density is equal at the two points. Provided that G' is less than p at the lower of the two skill levels being compared, the integral in (8) and (9) is larger at the lower skill level while the density is the same, so the marginal tax rate would be higher at the lower income level with either a constant elasticity of labor supply or a logarithmic utility-of-leisure.

Thus we can see the pattern of tax rates when the density of skills is first rising and then falling (and such that the workers with the modal skill work and have G' less than p in equilibrium). With logarithmic utility and an exponential distribution of skills where the density is falling, the pattern of marginal tax rates is U-shaped above the level where G' equals p, with the minimum of marginal rates occurring at the maximum of the density of skills. Since there is probably a positive density of skills at the level of skills where labor supply is zero, it does not follow from this argument that marginal tax rates are higher at the bottom of the income distribution than at the top. The same conclusion about a U-shaped pattern holds with a constant elasticity of labor supply and a Pareto distribution for skills where the distribution is falling.

8 Feenberg and Poterba (1992) use tax data to estimate the Pareto coefficients each year from 1951 to 1990 for the distribution of total incomes for the top 0.5% of the population. They find values of a that vary between 0.54 and 1.46.

Data 

While a careful attempt to fit this model to available data is beyond the scope of this note, it does seem interesting to examine the distribution of wages. For this purpose, calculations have been done using the March 1992 CPS. This survey asked individuals for annual earnings in 1991, as well as weeks worked and typical hours per week. From these numbers one can calculate an implied average hourly wage, annual earnings divided by annual hours. Figure 1 presents the distribution of wages for males aged 21 to 62.⁹ To keep the scale visible (and to ignore very low earnings), the figure shows wage rates from $1 to $100, in intervals of $2, and is based on a sample size of approximately 34,000. Approximately 17% of the sample report lower wages or no work; less than 0.1% of the sample report higher wages. The figure shows the single-peaked distribution one would have expected.

Table 1 reports calculations from the same data set. The columns show the mean wage per cell; the number of observations per cell, adjusted by cell size in order to be proportional to the density; the number of observations with higher wages; the ratio (1-F)/f; and the ratio (1-F)/nf, where n is the wage. In order to have reasonable cell sizes, the wage intervals are first $0.50, but are expanded above a wage of $26. The table shows sharply falling values of (1-F)/f through the range where the density is first rising and then roughly flat, that is, up to a wage of roughly $13, a little below the mean of $13.70. Beyond this point (1-F)/f is roughly constant, which implies a downward trend in (1-F)/nf.
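The columns of Table 1 can be computed from cell counts alone, since the sample size cancels out of (1-F)/f. The sketch below is illustrative only: it uses cell midpoints in place of the mean wage per cell, and hypothetical expected counts from an exponential tail with b = 0.1 in $0.50 cells, not the CPS data. With such a tail, (1-F)/f comes out flat across cells (close to, but because of the cell width slightly below, 1/b), and dividing by n produces the kind of downward trend in (1-F)/nf described above.

```python
import math


def tail_ratios(edges, counts, n_above_top=0.0):
    """Table-1-style columns from wage cells.

    edges: increasing cell boundaries, with len(edges) == len(counts) + 1.
    counts: observations in each cell.
    n_above_top: observations above the last boundary.
    Returns rows (n, density, number_above, ratio_f, ratio_nf), with n the cell
    midpoint standing in for the cell's mean wage.
    """
    rows = []
    above = sum(counts) + n_above_top
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        above -= c  # observations in strictly higher cells
        n = 0.5 * (lo + hi)
        density = c / (hi - lo)  # proportional to f; the common scale cancels below
        ratio_f = above / density if density > 0 else math.nan  # approximates (1 - F)/f
        rows.append((n, density, above, ratio_f, ratio_f / n))
    return rows


if __name__ == "__main__":
    # Hypothetical cells: expected counts for an exponential tail with b = 0.1.
    b, total = 0.1, 100_000.0
    edges = [10.0 + 0.5 * k for k in range(61)]
    counts = [total * (math.exp(-b * lo) - math.exp(-b * hi)) for lo, hi in zip(edges[:-1], edges[1:])]
    rows = tail_ratios(edges, counts, n_above_top=total * math.exp(-b * edges[-1]))
    for n, _, _, ratio_f, ratio_nf in rows[::10]:
        print(f"{n:6.2f} {ratio_f:7.3f} {ratio_nf:6.3f}")
```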

With a longer time horizon than one year, one would consider education to be an endogenous variable somewhat responsive to tax incentives. One might also be interested in the distribution of skills within a cohort. Thus a further calculation was done by regressing the log of the wage on education, age and age squared, and plotting the exponentiated residuals. Figure 2 shows the density of skills from this calculation. Table 2 presents the same calculations for this distribution as Table 1 does for the distribution of wages. This distribution shows a fatter tail than the distribution of wages, with (1-F)/f rising and (1-F)/nf roughly constant for the top 15% of the skill distribution.

9 No attempt was made to consider both earners in a two-earner family or wages of single females.
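The skill measure behind Table 2 removes the part of log wages explained by education and age. As a sketch only, assuming NumPy and a plain OLS fit with a constant (the exact specification, sample weights, and outlier handling of the original calculation are not given here), the exponentiated residuals can be computed as follows; the synthetic sample and its coefficients are hypothetical.

```python
import numpy as np


def skill_index(log_wage, education, age):
    """Exponentiated residuals from an OLS regression of log wage on
    a constant, education, age, and age squared."""
    X = np.column_stack([np.ones_like(age), education, age, age**2])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    residuals = log_wage - X @ beta
    return np.exp(residuals), beta


if __name__ == "__main__":
    # Hypothetical synthetic sample; the coefficients below are illustrative only.
    rng = np.random.default_rng(0)
    size = 5_000
    education = rng.integers(8, 21, size).astype(float)
    age = rng.uniform(21, 62, size)
    log_wage = 1.0 + 0.08 * education + 0.05 * age - 0.0005 * age**2 + rng.normal(0.0, 0.5, size)
    skills, beta = skill_index(log_wage, education, age)
    print(beta.round(4), np.median(skills))
```

Because the regression includes a constant, the residuals average to zero, so the skill index is centered at a geometric mean of one; the tables' ratios can then be computed from a histogram of this index in the same way as for wages.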

There is not a simple route between the Mirrlees model and policy implications for annual income taxes levied on families and covering both capital and labor incomes. The assumption of a zero income elasticity of labor supply and the limited information on the pattern of wage elasticities of supply by skill level would limit inferences even if there were a simple route. Nevertheless there are some lessons from the analysis. First, the sharp fall in (1-F)/f as skills approach the mode of the skill distribution from below seems highly relevant, especially if one wants to redistribute from people near the mode, rather than to them. This finding on the shape of an optimal (negative) income tax seems relevant in thinking about the phaseout of the earned income tax credit and, possibly, welfare reform. Second, this model confirms the implication of Mirrlees' calculations that the optimality of a zero tax rate at the highest income level is not a finding that sheds much light on optimal taxes, especially in the absence of knowledge of exactly where the top is. That is, if one replaced an unbounded distribution of skills by a bounded one with the same distribution up to some level and a (small) atom of skills at the highest level, the result of rising marginal tax rates would continue to hold until the top itself is reached. Third, the sensitivity of the pattern of marginal rates to the measure of skill seems potentially relevant, although different formulations of skill will be associated with different estimates of the elasticities of labor supply. Apart from direct relevance for setting taxes, consideration of this case clarifies some of the elements that matter for understanding optimal taxation, particularly the role of the shape of the distribution of skills.